ExoPlayer 2 - Track selection

Olly Woodman
Published in AndroidX Media3, Jun 16, 2016

When a piece of media contains multiple tracks of a given type, for example multiple video tracks in different qualities or multiple audio tracks in different languages, ExoPlayer has to select which track of each type it wishes to play. For adaptive video playbacks, ExoPlayer has to select the subset of video tracks that it wishes to play, and then switch between the selected tracks during playback according to some adaptive policy. Even for media that contains only a single track, ExoPlayer has to determine whether it supports the format in which the track is encoded, and what action to take if it does not. These are all examples of track selection problems, an area that we’re rethinking in ExoPlayer 2.

In order to perform a track selection it’s necessary to consider:

  • The tracks that are present in the media and their sample formats. Here “sample format” means both the type of the format (e.g. H.264) and its properties (e.g. the resolution and frame-rate).
  • Track selection flags specified in the media itself. For example, the Matroska container format supports a flag to indicate that a track should be selected by default given no other information (FlagDefault), as well as a flag to indicate that a track should always be selected (FlagForced). The HLS specification defines DEFAULT, FORCED and AUTOSELECT attributes that can be used in HLS master playlists to achieve similar results.
  • The codec and performance limitations of the TrackRenderers that will render the media. The MediaCodecVideoTrackRenderer and MediaCodecAudioTrackRenderer classes provided by ExoPlayer are most commonly used. In this case the limitations depend on the capabilities of the underlying decoders provided by the platform, which in turn depend on the capabilities of the device itself.
  • Application specific requirements. For example, an application may wish to restrict the quality of video that’s played when streaming over a mobile network, even if there’s sufficient bandwidth to play a higher quality format.
  • The preferences of the user. For example, if a piece of media contains English and German audio tracks, the application may use knowledge of the user’s language preferences to select the appropriate track. One nuance when considering user preferences is that the selection for one media type (e.g. the subtitle track) is often not independent of the selection for another (e.g. the audio track). An English user may prefer to select the original audio track when playing a movie, and want to enable the English subtitle only in the case that the selected audio track is in a foreign language.
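
The last point, that a subtitle choice can depend on the audio choice, can be made concrete with a small sketch. Everything in it is hypothetical: the Track class, the "original audio" flag and both methods are stand-ins invented for illustration, not ExoPlayer APIs.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these types or methods are ExoPlayer APIs. */
final class DependentSelectionSketch {

  /** Minimal stand-in for a track in the media. */
  static final class Track {
    final String type;        // "audio" or "text"
    final String language;    // Language code, e.g. "en" or "de"
    final boolean isOriginal; // True for the original audio of the media

    Track(String type, String language, boolean isOriginal) {
      this.type = type;
      this.language = language;
      this.isOriginal = isOriginal;
    }
  }

  /** Prefers the original audio track, falling back to the first audio track, else null. */
  static Track selectAudio(List<Track> tracks) {
    Track fallback = null;
    for (Track track : tracks) {
      if (!track.type.equals("audio")) {
        continue;
      }
      if (track.isOriginal) {
        return track;
      }
      if (fallback == null) {
        fallback = track;
      }
    }
    return fallback;
  }

  /** Enables a subtitle in the user's language only if the selected audio is in another language. */
  static Track selectSubtitle(List<Track> tracks, Track selectedAudio, String userLanguage) {
    if (selectedAudio == null || selectedAudio.language.equals(userLanguage)) {
      return null; // No audio selected, or the audio is already in the user's language.
    }
    for (Track track : tracks) {
      if (track.type.equals("text") && track.language.equals(userLanguage)) {
        return track;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    List<Track> tracks = Arrays.asList(
        new Track("audio", "en", false),
        new Track("audio", "de", true),
        new Track("text", "en", false));
    Track audio = selectAudio(tracks);
    Track subtitle = selectSubtitle(tracks, audio, "en");
    System.out.println("audio=" + audio.language
        + ", subtitle=" + (subtitle == null ? "none" : subtitle.language));
  }
}
```

For an English user watching media whose original audio is German, this prints `audio=de, subtitle=en`. The subtitle decision cannot be made until the audio decision is known, which is why the two selections cannot be treated independently.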

ExoPlayer 1 suffers from a number of shortcomings when considering the points above.

  • The tracks that are present in the media can be queried via ExoPlayer’s getTrackCount() and getTrackFormat() APIs. However, only supported tracks are exposed. This means that when playing a piece of media containing only AC-3 audio on a device that lacks an AC-3 decoder, an application has no way of determining that the media contains audio at all. This is information that an application might like to know, since it may be appropriate to display an error to the user in this case.
  • Flags specified in the media relating to track selection are ignored.
  • The codec and performance limitations of the TrackRenderers are taken into account more rigorously for DASH, SmoothStreaming and HLS playbacks than for playbacks of other types, and in a way that assumes ExoPlayer’s standard TrackRenderer classes are being used even though this may not be the case.
  • ExoPlayer’s setSelectedTrack() API can be called to select tracks to be played, allowing the preferences of the application developer and user to be applied. Unfortunately the asynchronous nature of this API makes it inefficient to use. ExoPlayer uses an internal thread to play and buffer media. When the tracks in a piece of media are first determined, a message is posted from this thread to an application thread to allow the tracks to be queried. When the application then calls setSelectedTrack(), a message is posted back to ExoPlayer’s internal thread and the selection is applied. Depending on the type of media being played, this may result in media being re-buffered. The posting of messages between threads takes time, particularly if the application thread is busy (e.g. inflating resources). Hence it introduces a delay between the tracks being determined and the player starting to buffer the correct selection of tracks for playback. If it were possible to reduce or eliminate this delay then ExoPlayer would be able to start buffering the correct tracks earlier, resulting in a snappier playback experience.
  • For adaptive video playback, ExoPlayer exposes a special adaptive video track through getTrackFormat() that can be selected by calling setSelectedTrack(). Selecting this track causes the player to adapt between the supported video tracks during playback. This approach lacks flexibility, in that applications are unable to select a subset of the supported video tracks for the player to adapt between.
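
The cost of the asynchronous round trip described above can be illustrated with plain Java executors. This is only a sketch: the thread names, class and method names are hypothetical, no ExoPlayer code is involved, and nothing is actually buffered. It simply records which thread each step runs on when the selection is made via the application thread, compared with making it directly on the playback thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative only: hypothetical names, not ExoPlayer code, and no real buffering. */
final class SelectionHandoffSketch {

  private static ExecutorService newThread(String name) {
    return Executors.newSingleThreadExecutor(runnable -> new Thread(runnable, name));
  }

  private static void await(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  private static String here(String step) {
    return step + " on " + Thread.currentThread().getName();
  }

  /** Round trip: playback thread -> application thread -> playback thread. */
  static List<String> asynchronousSelection() {
    List<String> log = Collections.synchronizedList(new ArrayList<String>());
    ExecutorService playback = newThread("playback");
    ExecutorService application = newThread("application");
    CountDownLatch done = new CountDownLatch(1);
    playback.execute(() -> {
      log.add(here("tracks determined"));
      application.execute(() -> { // Hop 1: waits behind any other application-thread work.
        log.add(here("selection chosen"));
        playback.execute(() -> {  // Hop 2: back to the playback thread.
          log.add(here("selection applied"));
          done.countDown();
        });
      });
    });
    await(done);
    playback.shutdown();
    application.shutdown();
    return log;
  }

  /** No hops: the selection is made as soon as the tracks are known. */
  static List<String> synchronousSelection() {
    List<String> log = Collections.synchronizedList(new ArrayList<String>());
    ExecutorService playback = newThread("playback");
    CountDownLatch done = new CountDownLatch(1);
    playback.execute(() -> {
      log.add(here("tracks determined"));
      log.add(here("selection chosen and applied"));
      done.countDown();
    });
    await(done);
    playback.shutdown();
    return log;
  }

  public static void main(String[] args) {
    System.out.println(asynchronousSelection());
    System.out.println(synchronousSelection());
  }
}
```

In the asynchronous variant the selection cannot be applied until two messages have been delivered, and the first of them queues behind whatever else the application thread is doing.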

In ExoPlayer 2 we’re introducing a new model for track selection that addresses the points described above. Since ExoPlayer 2 is still under development, some details may change. At a high level, however:

  • Each SampleSource will expose all of the tracks that are present in the media, regardless of whether the device or TrackRenderers being used are able to play them. Any flags specified in the media relating to track selection will be represented in the Format objects describing the tracks.
  • When the available tracks have been determined they will be passed along with the TrackRenderers to a new TrackSelector interface, whose job will be to select tracks to be played by each TrackRenderer. This will happen synchronously on ExoPlayer’s internal thread, eliminating the delay caused by posting of messages between threads in ExoPlayer 1.
  • The TrackSelector API will be designed in a way that doesn’t assume the track selection for one TrackRenderer can be made independently of the selection for another. This allows TrackSelectors to implement complex logic, such as selecting a subtitle track only if the selected audio track is not in the user’s preferred language.
  • The ExoPlayer library will provide a configurable TrackSelector implementation called DefaultTrackSelector, which we envisage will be suitable for nearly all use cases. Custom TrackSelector implementations will also be possible.
  • To take the codec and performance limitations of the TrackRenderers into account, each TrackRenderer will provide functionality for querying the extent to which it’s able to play a given track, and also the extent to which it’s able to adapt between multiple tracks of the same type. TrackSelector implementations will invoke this functionality to take TrackRenderer limitations into account in a consistent way for all types of media.
  • For adaptive playbacks, rather than having the SampleSource expose a special adaptive track, a TrackSelector will instead be able to select multiple individual tracks for playback by a single TrackRenderer. Hence an application will be able to specify an arbitrary subset of video tracks to use for adaptive playback. This functionality will be demonstrated in the ExoPlayer 2 demo application, as shown below.
    [Figure: Selecting a subset of video tracks for adaptive playback]
  • DefaultTrackSelector will expose all of the tracks in the media to the application, including those that cannot be played by any of the player’s TrackRenderers. This will allow applications to determine when there are unsupported tracks within a piece of media.
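
To make the shape of this model easier to picture, here is a minimal, self-contained sketch. It is not the ExoPlayer 2 API, whose details may still change: the Track, Renderer, Selector and MaxHeightSelector types below are hypothetical stand-ins. The sketch assumes a selector that receives every track and every renderer, asks each renderer whether it can play a track, selects a subset of video tracks for adaptation, and can report tracks that no renderer supports.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the selection model; these are not ExoPlayer's real types. */
final class TrackSelectorSketch {

  /** Stand-in for a track and its sample format. */
  static final class Track {
    final String type;  // "video" or "audio"
    final String codec; // e.g. "h264" or "ac3"
    final int height;   // Video height in pixels, or 0 for non-video tracks

    Track(String type, String codec, int height) {
      this.type = type;
      this.codec = codec;
      this.height = height;
    }
  }

  /** Stand-in for a renderer that can report whether it is able to play a track. */
  static final class Renderer {
    final String type;
    final List<String> supportedCodecs;

    Renderer(String type, String... supportedCodecs) {
      this.type = type;
      this.supportedCodecs = Arrays.asList(supportedCodecs);
    }

    boolean supports(Track track) {
      return type.equals(track.type) && supportedCodecs.contains(track.codec);
    }
  }

  /** Receives all tracks and all renderers, and returns one selection per renderer. */
  interface Selector {
    List<List<Track>> select(List<Renderer> renderers, List<Track> tracks);
  }

  /** Selects every playable video track up to a height cap, and one track for other types. */
  static final class MaxHeightSelector implements Selector {
    private final int maxVideoHeight;

    MaxHeightSelector(int maxVideoHeight) {
      this.maxVideoHeight = maxVideoHeight;
    }

    @Override
    public List<List<Track>> select(List<Renderer> renderers, List<Track> tracks) {
      List<List<Track>> selections = new ArrayList<>();
      for (Renderer renderer : renderers) {
        List<Track> selection = new ArrayList<>();
        for (Track track : tracks) {
          if (!renderer.supports(track)) {
            continue;
          }
          if (track.type.equals("video")) {
            if (track.height <= maxVideoHeight) {
              selection.add(track); // Part of the subset to adapt between.
            }
          } else if (selection.isEmpty()) {
            selection.add(track); // First supported track for non-video renderers.
          }
        }
        selections.add(selection);
      }
      return selections;
    }
  }

  /** Returns the tracks that none of the renderers can play. */
  static List<Track> unsupported(List<Renderer> renderers, List<Track> tracks) {
    List<Track> result = new ArrayList<>();
    for (Track track : tracks) {
      boolean playable = false;
      for (Renderer renderer : renderers) {
        playable |= renderer.supports(track);
      }
      if (!playable) {
        result.add(track);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Track> tracks = Arrays.asList(
        new Track("video", "h264", 480),
        new Track("video", "h264", 720),
        new Track("video", "h264", 1080),
        new Track("audio", "ac3", 0));
    List<Renderer> renderers = Arrays.asList(
        new Renderer("video", "h264"),
        new Renderer("audio", "aac"));
    List<List<Track>> selections = new MaxHeightSelector(720).select(renderers, tracks);
    System.out.println("video tracks selected: " + selections.get(0).size());
    System.out.println("audio tracks selected: " + selections.get(1).size());
    System.out.println("unsupported tracks: " + unsupported(renderers, tracks).size());
  }
}
```

With a 720-pixel cap the video renderer receives the 480 and 720 tracks to adapt between. The AC-3 track is selected by no renderer but is still visible to the application, which could use that to show an error.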

Track selection is a complicated problem. We hope that with ExoPlayer 2 we’ll be able to support more flexible, correct and efficient track selection than is possible today.
